Goto

Collaborating Authors

 confidence band


Post-Selection Distributional Model Evaluation

Farzaneh, Amirmohammad, Simeone, Osvaldo

arXiv.org Machine Learning

Formal model evaluation methods typically certify that a model satisfies a prescribed target key performance indicator (KPI) level. However, in many applications, the relevant target KPI level may not be known a priori, and the user may instead wish to compare candidate models by analyzing the full trade-offs between performance and reliability achievable at test time by the models. This task, requiring the reliable estimate of the test-time KPI distributions, is made more complicated by the fact that the same data must often be used both to pre-select a subset of candidate models and to estimate their KPI distributions, causing a potential post-selection bias. In this work, we introduce post-selection distributional model evaluation (PS-DME), a general framework for statistically valid distributional model assessment after arbitrary data-dependent model pre-selection. Building on e-values, PS-DME controls post-selection false coverage rate (FCR) for the distributional KPI estimates and is proved to be more sample efficient than a baseline method based on sample splitting. Experiments on synthetic data, text-to-SQL decoding with large language models, and telecom network performance evaluation demonstrate that PS-DME enables reliable comparison of candidate configurations across a range of reliability levels, supporting the statistically reliable exploration of performance--reliability trade-offs.


Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models

Yining Wang, Xi Chen, Yuan Zhou

Neural Information Processing Systems

In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of Op? T log log T q, which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.


A Doubly Robust Machine Learning Approach for Disentangling Treatment Effect Heterogeneity with Functional Outcomes

Salmaso, Filippo, Testa, Lorenzo, Chiaromonte, Francesca

arXiv.org Machine Learning

Causal inference is paramount for understanding the effects of interventions, yet extracting personalized insights from increasingly complex data remains a significant challenge for modern machine learning. This is the case, in particular, when considering functional outcomes observed over a continuous domain (e.g., time, or space). Estimation of heterogeneous treatment effects, known as CATE, has emerged as a crucial tool for personalized decision-making, but existing meta-learning frameworks are largely limited to scalar outcomes, failing to provide satisfying results in scientific applications that leverage the rich, continuous information encoded in functional data. Here, we introduce FOCaL (Functional Outcome Causal Learning), a novel, doubly robust meta-learner specifically engineered to estimate a functional heterogeneous treatment effect (F-CATE). FOCaL integrates advanced functional regression techniques for both outcome modeling and functional pseudo-outcome reconstruction, thereby enabling the direct and robust estimation of F-CATE. We provide a rigorous theoretical derivation of FOCaL, demonstrate its performance and robustness compared to existing non-robust functional methods through comprehensive simulation studies, and illustrate its practical utility on diverse real-world functional datasets. FOCaL advances the capabilities of machine intelligence to infer nuanced, individualized causal effects from complex data, paving the way for more precise and trustworthy AI systems in personalized medicine, adaptive policy design, and fundamental scientific discovery.






Learning the score under shape constraints

Lewis, Rebecca M., Feng, Oliver Y., Reeve, Henry W. J., Xu, Min, Samworth, Richard J.

arXiv.org Machine Learning

Score estimation has recently emerged as a key modern statistical challenge, due to its pivotal role in generative modelling via diffusion models. Moreover, it is an essential ingredient in a new approach to linear regression via convex $M$-estimation, where the corresponding error densities are projected onto the log-concave class. Motivated by these applications, we study the minimax risk of score estimation with respect to squared $L^2(P_0)$-loss, where $P_0$ denotes an underlying log-concave distribution on $\mathbb{R}$. Such distributions have decreasing score functions, but on its own, this shape constraint is insufficient to guarantee a finite minimax risk. We therefore define subclasses of log-concave densities that capture two fundamental aspects of the estimation problem. First, we establish the crucial impact of tail behaviour on score estimation by determining the minimax rate over a class of log-concave densities whose score function exhibits controlled growth relative to the quantile levels. Second, we explore the interplay between smoothness and log-concavity by considering the class of log-concave densities with a scale restriction and a $(β,L)$-Hölder assumption on the log-density for some $β\in [1,2]$. We show that the minimax risk over this latter class is of order $L^{2/(2β+1)}n^{-β/(2β+1)}$ up to poly-logarithmic factors, where $n$ denotes the sample size. When $β< 2$, this rate is faster than could be obtained under either the shape constraint or the smoothness assumption alone. Our upper bounds are attained by a locally adaptive, multiscale estimator constructed from a uniform confidence band for the score function. This study highlights intriguing differences between the score estimation and density estimation problems over this shape-constrained class.


A General Approach to Visualizing Uncertainty in Statistical Graphics

Petek, Bernarda, Nabergoj, David, Štrumbelj, Erik

arXiv.org Artificial Intelligence

We present a general approach to visualizing uncertainty in static 2-D statistical graphics. If we treat a visualization as a function of its underlying quantities, uncertainty in those quantities induces a distribution over images. We show how to aggregate these images into a single visualization that represents the uncertainty. The approach can be viewed as a generalization of sample-based approaches that use overlay. Notably, standard representations, such as confidence intervals and bands, emerge with their usual coverage guarantees without being explicitly quantified or visualized. As a proof of concept, we implement our approach in the IID setting using resampling, provided as an open-source Python library. Because the approach operates directly on images, the user needs only to supply the data and the code for visualizing the quantities of interest without uncertainty. Through several examples, we show how both familiar and novel forms of uncertainty visualization can be created. The implementation is not only a practical validation of the underlying theory but also an immediately usable tool that can complement existing uncertainty-visualization libraries.


Asymptotic confidence bands for centered purely random forests

Neumeyer, Natalie, Rabe, Jan, Trabs, Mathias

arXiv.org Machine Learning

In a multivariate nonparametric regression setting we construct explicit asymptotic uniform confidence bands for centered purely random forests. Since the most popular example in this class of random forests, namely the uniformly centered purely random forests, is well known to suffer from suboptimal rates, we propose a new type of purely random forests, called the Ehrenfest centered purely random forests, which achieve minimax optimal rates. Our main confidence band theorem applies to both random forests. The proof is based on an interpretation of random forests as generalized U-Statistics together with a Gaussian approximation of the supremum of empirical processes. Our theoretical findings are illustrated in simulation examples.